Machine Learning Engineer | PyTorch | Hugging Face Transformers | Machine Learning | Genomics | London, UK
Safeguarding the future of food
We’re on a mission to accelerate the development of more productive, sustainable, nutritious & climate-resilient food sources.
How? By building the world’s first ML-driven target discovery platform for crop gene-editing.
From drug target discovery to crop target discovery
While gene-editing of crops is becoming ever more efficient, identifying which genes to edit, and how to edit them, remains a significant challenge.
To overcome this bottleneck, we use cutting-edge deep learning to accurately and efficiently identify high-value genetic targets for crop gene-editing.
Our approach draws inspiration from recent advances in drug discovery, incorporating LLMs, transformers and graph-based technologies to build a best-in-class discovery platform for plant sciences.
Team
Led by 2 co-founders, we are now a team of 12, including 2 ML engineers, 2 data engineers and 3 bioinformaticians. We also have a remote, part-time intern conducting ML research. We primarily work together in person from our office in Spitalfields, London, 4 days per week.
Position
Working within our core ML team, you’ll help us build best-in-class genomic foundation models tailored to the plant sciences. This could involve everything from model training to data curation to evals, but we also welcome applicants with specialist expertise who feel they could make a unique contribution to a particular stage of the training lifecycle of large, complex models.
Applicants should have direct experience using genomic data in a machine learning context. We are particularly interested in applicants with experience working with generative foundation models of DNA or transcriptomic data. Our modelling efforts also have a strong focus on multimodality, so any experience with, or interest in, other data modalities (e.g. text) is a plus.
Core Responsibilities
• Contribution to the development of our core proprietary -omics models, including model training & eval development.
• Recreation of SOTA models from the scientific literature & benchmarking against our internal models & evals.
Additional / Development Areas
• Model deployment, ensuring flexible & scalable inference access for our wider Data Science team.
• Collaboration with our bioinformatics team to ingest, standardise and QC data from multiple sources (internal & external), preparing it for our training pipelines.
• Supporting our wider ML team on additional model development and commercial projects.
Competencies
Core
• Postgraduate study (MSc or PhD) in ML, with demonstrable application to a biological domain.
• Demonstrable experience building modern architectures (transformers, diffusers, etc.) from scratch and applying them to real biological datasets
• Experience working with large-scale transcriptomic atlases
o Preferably with experience in non-human organisms (although this is not a requirement)
• Experience working with PyTorch, Hugging Face Transformers & Diffusers
• Experience working with ML accelerators (e.g. GPUs or TPUs)
Nice-to-have
• Relevant publications in reputable journals/venues or contributions to open-source projects
• Demonstrable exposure to and interest in probabilistic ML, causal ML or active learning
• Experience with distributed model training (data and model parallelism)
• Experience working on biological data curation, including data cleansing & pre-processing of -omics datasets.
• Exposure to cloud-based ML orchestration frameworks such as Amazon SageMaker and Vertex AI.
• Experience with model deployment in an enterprise setting.
Benefits
• Competitive salary & equity options
• 25 days’ annual leave & the option of a 2-week work-from-anywhere policy
• Benefits package
• Career development opportunities as the company scales
• Ownership of ambitious, mission-driven work with real-world impact
• Vibrant, innovative & supportive work environment with a committed team
• Access to conferences, events & professional development resources